* .do file that generates a cut-down QoG data file for merging into V-Dem

*** Set the working directory to the ReplicationGovernance\Finkelextension folder

cd "XXX\ReplicationGovernance\Finkelextension"

* Load QoG.
* NOTE: THIS DATA FILE IS NOT IN THE REPLICATION PACKAGE DUE TO ITS LARGE SIZE.
* You MUST download the QoG Standard Time-Series dataset (January 2020) if you wish to run this analysis seamlessly from the core data merge and clean onward.

use "Updated QoG\qog_std_ts_jan20.dta", clear

* rename the 3-letter country code variable to the name V-Dem is keyed on

rename ccodealp country_text_id

*keep only a few variables that we need as controls

keep cname country_text_id year wdi_gdpcapgr wdi_pop wdi_gdppppcon2011 al_ethnic2000 wdi_incsh20h wdi_gini

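* generate the 2002-2018 population average; the "if" leaves it missing outside that window,
* so the second egen copies the single value to every observation of the country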

sort country_text_id
by country_text_id: egen pop20022018average=mean(wdi_pop) if year>2001 & year<2019
by country_text_id: egen pop20022018averagemax=max(pop20022018average)

*generate average wealth 2002-2018 as static variables

sort country_text_id
by country_text_id: egen gdp20022018average=mean(wdi_gdppppcon2011) if year>2001 & year<2019
by country_text_id: egen gdp20022018averagemax=max(gdp20022018average)

* generate average Gini and top-20% income share; start from 1995 to maximize coverage

sort country_text_id
by country_text_id: egen gini19952018average=mean(wdi_gini) if year>1994 & year<2019
by country_text_id: egen gini19952018averagemax=max(gini19952018average)

sort country_text_id
by country_text_id: egen toptwenty19952018average=mean(wdi_incsh20h) if year>1994 & year<2019
by country_text_id: egen toptwenty19952018averagemax=max(toptwenty19952018average)

*drop unnecessary years

drop if year<2001 | year>2018

* copy the ethnic fractionalization (2000) value to every year of each country, which fills in 2017 and 2018

by country_text_id: egen al_ethnic2000max=max(al_ethnic2000)

* drop the time-varying source variables and the intermediate averages; only the static versions (as in Finkel et al.) are needed

drop al_ethnic2000 wdi_gdppppcon2011 wdi_gini wdi_incsh20h wdi_pop pop20022018average gdp20022018average gini19952018average toptwenty19952018average

* drop spurious observations for historical country entries that share a 3-letter code with a current country

drop if cname=="Germany, West"
drop if cname=="Cyprus (-1974)"
drop if cname=="Ethiopia (-1992)"
drop if cname=="France (-1962)"
drop if cname=="Malaysia (-1965)"
drop if cname=="Sudan (-2011)" & year>2011
drop if cname=="Sudan (2012-)" & year<2012
drop if cname=="Pakistan (-1970)"
drop if cname=="Vietnam, North"
drop if cname=="Yemen, North"
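
* Optional check (not in the original script): a minimal sketch that reports any
* remaining country-year duplicates before saving. It assumes country_text_id
* and year together are meant to identify each observation, which the later
* merge into V-Dem would presumably rely on. duplicates report only prints a
* table and does not stop the do-file.

duplicates report country_text_id year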

*Save so that it can be merged into vdem v9.

save "QoGdata20012018.dta", replace
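
* Illustrative sketch of the later merge step (not part of this file, so it is
* left commented out). It assumes the V-Dem data are already in memory, that
* they are keyed on country_text_id and year, and that each key appears once
* on both sides; adjust to the actual merge script.
* merge 1:1 country_text_id year using "QoGdata20012018.dta"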







